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SUMMARY 


As< 


Several  algorithms  have  been  proposed  for  the  computation  of  maximum 
likelihood  estimates  for  contingency  tables.  Since  multinomial  logit 
response  models  can  be  treated  as  special  versions  of  loglinear  models, 
many  of  these  techniques  can  be  used  for  logit  models  as  well.  In  this 
paper  we  compare,  in  a  qualitative  fashion,  the  relative  merits  of  (i)  two 
variants  of  Newton's  method  developed  by  Fienberg  and  Stewart  (ii)  GLIM, 
as  developed  by  Nelder  and  Wedderburn  (ill)  the  BMDP  program  for  stepwise 
logistic  regression,  and  (lv)  the  widely  used  method  of  iterative  pro¬ 
portional  fitting. 
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1.  INTRODUCTION 

The  analysis  of  cross-classified  categorical  data  involves  statistical 
problems  where  both  the  explanatory  variables  (or  factors)  and  response 
variables  are  categorical.  Loglinear,  logit  and  multinomial  logit  response 
models  are  often  recommended  for  the  analysis  of  such  data  (e.g.  see 
Bishop,  Fienberg  and  Holland,  1975;  Bock,  1975,  Fienberg,  1977,  Haberman, 
1974,  1978).  Since  logit  models  are  also  loglinear  models,  the  computational 
methods  associated  with  fitting  loglinear  models  can  also  be  used  for  logit 
models;  however,  it  is  often  more  efficient  to  use  techniques  which  are 
specially  designed  for  analyses  using  logit  or  multinomial  logit  response 
models. 

In  this  paper  we  compare  four  different  computational  approaches  for 
maximum  likelihood  estimation  of  parameters  and  expected  cell  values  in 
logit  models; 

(a)  a  variant  on  Newtons's  method,  as  developed  by  Fienberg  and 
Stewart  (1979),  applied  in  somewhat  different  forms  for  the 
loglinear  and  logit  formulations, 

(b)  iteratively  reweighted  least  squares,  as  implemented  in 
GLIM  (see  Nelder  and  Wedderburn,  1972), 

(c)  Newton's  method,  for  the  logit  model,  as  used  in  BMDPLR 
(see  Jennrich  and  Moore,  1975), 

(d)  iterative  proportional  fitting  (IFF). 

Additional  comparisons  could  be  made  with  the  Newton-Raphson  formulation 
of  Bock  (1975)  or  Haberman  (1978),  but  are  not  included  here. 

2.  LOGLINEAR.  LOGIT  AND  MULTINOMIAL  LOGIT  RESPONSE  MODELS 

Consider  a  problem  involving  a  response  variable  with  K  categories 
and  two  explanatory  variables,  with  I  and  J  categories  respectively. 


The  data  are  counts  in  the  form  of  an  ixjxg  table  where  the  totals  in 
the  ix j  margin,  corresponding  to  the  explanatory  variables,  are  taken 
as  fixed.  We  assume  the  sampling  model  for  the  counts  is  product-multinomial 
(Bishop,  Fienberg  and  Holland,  1975).  Multinomial  logit  response  models, 
involving  K-l  simultaneous  logit  equations,  are  equivalent  to  loglinear 
models  that  treat  the  three  variables  as  responses  but  Include  in  the  model 
all  terms  corresponding  to  main  effects  and  interactions  for  the  explanatory 
variables  (Fienberg,  1977,  Chapter  6).  The  special  case  of  102  yields 
the  familiar  logit  model.  By  using  this  correspondence,  any  algorithm 
for  fitting  loglinear  models  can  be  used  for  the  logit  problem.  The  dis¬ 
advantage  to  such  an  approach  is  the  added  number  of  (unnecessary)  para¬ 
meters  in  the  model.  In  the  following  sections  we  discuss  some  of  the 
properties  of  various  algorithms  and  their  implementation.  The  discussion 
is  briefly  summarized  in  Table  1. 

2.1  NEWTON'S  METHOD 

Fienberg  and  Stewart  (1979)  have  used  a  variant  of  Newton's  method 
for  analyzing  both  loglinear  and  multinomial  logit  response  models.  Their 
first  algorithm  treats  the  logit  problem  as  a  loglinear  model.  The  estimated 
covariance  matrix  for  the  parameters  is  adjusted  for  the  required  conditioning 
at  the  end  of  the  computations  using  formulae  available  from  Haberman  (1974). 
Their  second  algorithm  proceeds  by  initially  conditioning  on  the  explana¬ 
tory  variables,  and  then  using  a  somewhat  different,  but  closely  related, 
set  of  computations.  The  choice  between  the  algorithms  depends  upon 
the  number  of  categories  and  structure  of  the  response  variables.  The 
second  approach  should  be  more  efficient  when  K  is  large  or  there  are  many 
response  variables.  In  other  words,  if  the  multinomial  logit  response 
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model  has  considerably  fewer  parameters  than  its  loglinear  equivalent, 
then  one  should  initially  condition  on  the  explanatory  variables. 

Both  of  these  algorithms  involve  the  construction  of  the  upper  half 
of  a  p*p  weighted  cross  product  matrix,  where  p  is  the  dimension  of 
the  design  manifold.  Note  that  p  is  not  the  same  for  the  two  approaches. 
In  both  algorithms  a  sparse  n  xp  design  matrix  (where  n  corresponds  to 
the  number  of  cells  in  the  cross-classification)  is  generated  internally, 
but  is  never  actually  stored  during  the  computations.  The  calculations 
proceed  via  Newton's  method  with  variable  step  length,  using  a  Cholesky 
decomposition  with  pivoting.  Extrinsic  aliasing  (i.e.,  non-identif lability 
of  certain  parameters)  is  detected  by  small  pivot  elements  during  the 
decomposition.  As  the  internal  parameterization  is  not  readily  interpreted, 
u-terms  or  other  desired  parameterizations  are  estimated  by  direct  compu¬ 
tations  on  the  table  of  fitted  values.  In  this  way  it  is  not  necessary 
to  address  the  statistical  consideration  of  marginallty  (see  discussions 
in  Nelder,  1976  and  1977,  and  in  Fienberg,  1977)  until  the  end  of  the 
computations. 

2.2  GLIM 

The  GLIM  algorithm,  as  developed  by  Nelder  and  his  colleagues,  is 
designed  for  analyzing  Generalised  Linear  Models  (see  Nelder  and  Wedderburn, 
1972).  Logit,  multinomial  logit  response,  and  loglinear  models  are  all 
encompassed  in  the  family  of  generalised  linear  models,  however  only  logit 
and  loglinear  models  are  easily  fitted  in  GLIM.  In  order  to  fit  a  K  >  2 
level  response  variable  in  GLIM,  the  user  can  either  fit  the  associated 
loglinear  model  or  treat  the  problem  in  an  asymmetric  fashion,  e.g.,  by 
the  use  of  continuation  ratios  (see  Fienberg,  1977,  Chapter  6).  It  is  also 


possible  to  fit  the  multinomial  logit  response  model  directly  by  using 
GLIM's  macro  and  user-defined  model  capabilities.  This  approach  requires 
considerable  storage  and  numerical  sophistication  on  the  part  of  the  user. 
For  many  problems  fitting  the  loglinear  model  is  perfectly  satisfactory, 
but  for  some  problems  the  number  of  parameters  becomes  too  large  for  the 
program  to  handle.  The  asymptotic  covariance  matrix  is  not  adjusted  for 
the  appropriate  conditioning  in  this  approach  for  multinomial  logits. 

GLIM  uses  Gaussian  elimination  to  sweep  out  the  rows  of  a  weighted 
cross-product  matrix.  In  order  to  preserve  the  marginality  constraints 
(l.e.,  due  to  the  restriction  to  hierarchial  models)  there  is  no  pivoting 
during  the  Gaussian  elimination.  As  indicated  in  the  previous  sub-section, 
marginality  considerations  can  be  satisfactorily  addressed  at  the  end  of 
the  computational  problem,  and  the  good  numerical  properites  of  pivoting 
could  have  been  utilized  during  the  Gaussian  elimination.  GLIM  does  not 
do  this.  Extrinsic  aliasing  of  parameters  is  detected  when  a  diagonal 
element  of  the  weighted  cross-product  matrix  drops  by  more  than  10  ^ 
in  two  successive  iterations.  This  procedure  may  encounter  numerical 
instabilities,  particularly  when  the  program  is  used  for  logistic  regres¬ 
sion  problems. 

2.3  BMDP 

There  are  at  least  two  methods  of  fitting  logit  models  in  BMDP. 

If  the  response  is  binary  the  stepwise  logistic  regression  program  BMDPLR 
may  be  used.  Otherwise  the  iterative  proportional  fitting  (IFF)  algorithm 
for  loglinear  models,  in  BMDP3F,  is  a  possibility.  We  will  confine  our 
coments  here  to  the  logistic  regression  program,  defering  a  discussion 


of  IP F  to  the  next  section. 


The  algorithm  of  BMDPLR  is  a  specialisation  of  the  BMDP  nonlinear 
regression  program,  and  corresponds  to  iteratively  reweighted  least 
squares.  The  description  by  Jennvich  and  Moore  (1975)  indicates  that 
the  algorithm  proceeds  by  explicitly  inverting  the  weighted  cross-product 
matrix,  using  the  method  of  Gauss-Jordam  elimination.  Aliasing  is  then 
determined  by  small  diagonal  pivots;  however,  it  appears  that  marglnality 
may  be  violated  here.  As  with  GLIM  it  is  possible  to  fit  user  defined 
models  with  BMDP.  In  our  multinomial  logit  response  case,  this  would 
Involve  using  BMDP3R,  the  nonlinear  regression  program,  together  with 
some  Fortran  subroutines.  Again,  considerable  expertise  on  the  part  of  the 
user  is  required. 

Although  it  is  not  our  direct  concern  here,  we  comment  briefly  on 
the  stepwise  aspects  of  this  algorithm.  The  BMDPLR  program  is  the  only 
one  we  reviewed  that  contains  an  automated  selection  procedure.  However, 
such  a  structure  could  easily  be  Implemented  in  the  other  algorithms, 
particularly  GUM,  which  has  a  macro  facility.  Two  stepping  procedures 
for  BMDPLR  are  outlined  in  Dixon  and  Brown  (1979) .  One  is  based  on  the 
likelihood  ratio  and  uses  standard  asymptotic  results  for  its  justification. 

The  other  is  based  on  an  F-statistic  approximation,  using  an  estimate  of 
the  asymptotic  covariance  matrix,  which  may  be  of  practical  use  but  seems 
to  lack  a  theoretical  basis.  The  stepping  procedures  do  preserve  marglnality, 
although  this  option  may  be  overridden. 

2.4  ITERATIVE  PROPORTIONAL  FITTING 

The  Iterative  Proportional  Fitting  algorithm  (IPF)  is  especially  useful  for 
fitting  loglinear  models  because  of  the  elementary  nature  of  the  iterations. 

The  algorithm  does  not  directly  fit  logit  or  multinomial  logit  response 
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models;  these  must  be  fitted  In  their  equivalent  loglinear  forms.  This 
is  not  as  severe  a  restriction  as  it  seems,  since  the  major  storage 
requirement  for  IPF  is  the  table  itself.  Extra  model  parameters  cause  only 
a  very  slight  Increase  in  storage  needs.  Of  course,  the  major  disadvantages 
of  IPF  are  (a)  its  sometimes  very  slow  rate  of  convergence  and  (b)  the 
lack  of  covariance  estimates  for  the  fitted  values.  One  advantage  of 
IPF  is  that  when  maximum  likelihood  estimates  for  the  fitted  values  have 
a  closed  form  (i.e.,  for  decomposable  models),  a  version  of  IPF  will 
obtain  these  in  one  iteration  (see  Bishop,  Fienberg  and  Holland,  1975,  and 
Haberman,  1974,  p.  191).  This  is  not  true  for  any  of  the  Newton  algorithms. 

3.  COMPARISONS 

Some  of  the  qualitative  aspects  of  the  algorithms  in  question  are 
summarized  in  Table  1.  The  GLIM  and  BMDP  programs  have  similar  properties. 
As  GLIM  is  an  Interactive  program,  it  is  easier  to  use  but  pays  the  price 
by  being  able  to  analyze  only  smaller  problems.  (Depending  on  the  imple¬ 
mentation,  the  BMDP  package  can  also  suffer  from  a  lack  of  storage  space.) 
Both  algorithms  consider  the  logit  response  problem  as  a  special  case  of 
logistic  regression. 

The  Fienberg-Stewart  algorithms  have  opted  for  economy  of  storage 
over  efficiency  of  operation.  Thus  with  similar  front-end  programs,  be 
they  Interactively  or  batch  oriented,  one  would  expect  these  algorithms 
to  be  able  to  handle  larger  problems  than  either  GLIM  or  BMDP.  The 
Fienberg-Stewart  algorithms  have  the  advantage  that  they  are  able  to 
directly  fit  multinomial  logit  response  models,  whereas  the  other  approaches 
require  that  the  loglinear  equivalent  of  the  model  be  fitted.  Whether 
this  is  an  advantage  or  not  depends  upon  the  size  of  the  marginal  array 
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corresponding  to  the  explanatory  variables.  When  this  margin  is  small, 
some  advantages  may  accrue  to  the  loglinear  approach. 

We  have  not  made  direct  comparisons  here  on  speed  of  convergence, 
but  we  note  that,  when  IPF  needs  to  iterate  (i.e.,  for  non-decomposable 
models),  it  has  linear  convergence,  while  the  other  algorithms  enjoy 
quadratic  convergence  properties .  We  expect  that  the  "special  features" 
in  the  Fienberg-Stewart  algorithms  should  allow  for  slightly  faster  conver¬ 
gence  than  do  the  GLIM  and  BMD  algorithms,  but  rate  of  convergence  should 
not  be  a  serious  distinction  among  these  algorithms. 
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Both  BMDP  and  GLIM  can  handle  this  problem  by  treating  it  as  a  loglinear  model. 
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qualitative  fashion,  the  relative  merits  of  (i)  two  variants  of  Newton's 
method  developed  by  Fienberg  and  Stewart  (ii)  GLIM,  as  developed  by  Nelder  and 
Wedderburn  (iii)  the  BMDP  program  for  stepwise  logistic  regression,  and  (iv) 
the  widely  used  mpfhnri  of  It pt-aHvo  proportional  fitting. 
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